DESIGN BASED INCOMPLETE U-STATISTICS

Abstract

U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size $n$ increases. Specifically, the number of combinations, say $m$, that a U-statistic of order $d$ has to evaluate is $O(n^d)$. Many efforts have been made to approximate the original U-statistic using a small subset of the combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require $m$ to grow at least faster than $n$, albeit more slowly than $n^d$, for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient, even when $m$ grows more slowly than $n$. In some cases, $m$ is only required to grow faster than $\sqrt{n}$. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.
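To make the computational contrast concrete, the following is a minimal Python sketch of a complete U-statistic next to a randomly subsampled incomplete one. It is an illustration only: the function names, the variance kernel, the sample size, and the choice of drawing the $m$ subsets uniformly at random are all assumptions made here, and the random scheme is a simple baseline, not the design-based construction the paper proposes.

```python
import itertools
import math
import random


def complete_u_statistic(data, kernel, d):
    """Average the kernel over all C(n, d) size-d subsets of the data."""
    total = 0.0
    count = 0
    for subset in itertools.combinations(data, d):
        total += kernel(*subset)
        count += 1
    return total / count


def random_incomplete_u_statistic(data, kernel, d, m, seed=0):
    """Average the kernel over m size-d subsets drawn uniformly at random.

    Assumption: subsets are drawn independently, so the same subset may
    be selected more than once. This is a baseline, not the paper's design.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        subset = rng.sample(data, d)  # d distinct observations
        total += kernel(*subset)
    return total / m


def variance_kernel(x, y):
    """Order-2 kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


# Hypothetical data: 200 standard normal draws.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

n_combinations = math.comb(len(data), 2)  # number of kernel evaluations for the complete statistic
full = complete_u_statistic(data, variance_kernel, d=2)
approx = random_incomplete_u_statistic(data, variance_kernel, d=2, m=400)
```

Here the complete statistic evaluates the kernel on every pair, while the incomplete one uses only 400 evaluations; the abstract's question is how slowly such an $m$ may grow with $n$ without losing asymptotic efficiency.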

Similar resources

Scaling-up Empirical Risk Minimization: Optimization of Incomplete $U$-statistics

In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by U-statistics of degree d ≥ 1, i.e. functionals of the training data with low variance that take the form of averages over k-tuples. From a computational perspective, the calculation of such statistics is highly expensive even for a moderate sample siz...

Incomplete generalized U-statistics for food risk assessment.

This article proposes statistical tools for quantitative evaluation of the risk due to the presence of some particular contaminants in food. We focus on the estimation of the probability of the exposure to exceed the so-called provisional tolerable weekly intake (PTWI), when both consumption data and contamination data are independently available. A Monte Carlo approximation of the plug-in esti...

U-Statistics Based on Spacings

In this paper, we investigate the asymptotic theory for U-statistics based on sample spacings, i.e. the gaps between successive observations. The usual asymptotic theory for U-statistics does not apply here because spacings are dependent variables. However, under the null hypothesis, the uniform spacings can be expressed as conditionally independent Exponential random variables. We exploit th...

SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk

In many learning problems, ranging from clustering to ranking through metric learning, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations. In this paper, we focus on how to best implement a stochastic approximation approach to solve such risk minimization problems. We argue that in the ...

Lecture 4: U-Statistics & U-Process Minimizers

Hoeffding (1948a) developed the basic theory of U-Statistics, a family of estimates which includes many familiar and interesting examples. This lecture reviews this theory. Standard references for the material presented here include Serfling (1980, Chapter 5), Lehmann (1999, Chapter 6) and van der Vaart (1998, Chapters 11 & 12). The basic theory of U-Statistics allows for a presentation of larg...

Journal

Journal title: Statistica Sinica

Year: 2021

ISSN: 1017-0405, 1996-8507

DOI: https://doi.org/10.5705/ss.202019.0098